111 research outputs found
Divergence of the ADAM algorithm with fixed-stepsize: a (very) simple example
A very simple unidimensional function with Lipschitz continuous gradient is
constructed such that the ADAM algorithm with constant stepsize, started from
the origin, diverges when applied to minimize this function in the absence of
noise on the gradient. Divergence occurs irrespective of the choice of the
method parameters.
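For readers unfamiliar with the iteration discussed in this abstract, the following is a minimal sketch of the standard ADAM update with a constant stepsize in one dimension. It does not reproduce the paper's counterexample: the test function (a simple quadratic), the name `adam_minimize`, and the default parameter values are all assumptions chosen for illustration.

```python
import math

def adam_minimize(grad, x0, step=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, iters=2000):
    """Run the standard ADAM iteration with a constant stepsize in 1D.

    grad : callable returning the (noise-free) derivative at x
    x0   : starting point
    The default parameter values are illustrative assumptions.
    """
    x, m, v = x0, 0.0, 0.0
    for k in range(1, iters + 1):
        g = grad(x)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** k)           # bias corrections
        v_hat = v / (1.0 - beta2 ** k)
        x -= step * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Hypothetical example: f(x) = 0.5 * (x - 1)^2, started from the origin.
x_final = adam_minimize(lambda x: x - 1.0, x0=0.0)
print(x_final)
```

On the first iteration the bias-corrected ratio equals the sign of the gradient, so the first move has length close to `step` regardless of the gradient's magnitude. This scale invariance is a general property of the update, not a claim about the construction used in the paper.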
Adaptive Regularization for Nonconvex Optimization Using Inexact Function Values and Randomly Perturbed Derivatives
A regularization algorithm allowing random noise in derivatives and inexact
function values is proposed for computing approximate local critical points of
any order for smooth unconstrained optimization problems. For an objective
function with Lipschitz continuous -th derivative and given an arbitrary
optimality order , it is shown that this algorithm will, in
expectation, compute such a point in at most
inexact evaluations of and its derivatives whenever , where
is the tolerance for th order accuracy. This bound becomes at
most
inexact evaluations if and all derivatives are Lipschitz continuous.
Moreover these bounds are sharp in the order of the accuracy tolerances. An
extension to convexly constrained problems is also outlined. Comment: 22 pages
Adaptive Regularization Algorithms with Inexact Evaluations for Nonconvex Optimization
A regularization algorithm using inexact function values and inexact
derivatives is proposed and its evaluation complexity analyzed. This algorithm
is applicable to unconstrained problems and to problems with inexpensive
constraints (that is, constraints whose evaluation and enforcement have
negligible cost) under the assumption that the derivative of highest degree is
-Hölder continuous. It features a very flexible adaptive mechanism
for determining the inexactness which is allowed, at each iteration, when
computing objective function values and derivatives. The complexity analysis
covers arbitrary optimality order and arbitrary degree of available approximate
derivatives. It extends results of Cartis, Gould and Toint (2018) on the
evaluation complexity to the inexact case: if a th order minimizer is sought
using approximations to the first derivatives, it is proved that a suitable
approximate minimizer within is computed by the proposed algorithm
in at most iterations and at most
approximate
evaluations. An algorithmic variant, although more rigid in practice, can be
proved to find such an approximate minimizer in
evaluations. While
the proposed framework remains so far conceptual for high degrees and orders,
it is shown to yield simple and computationally realistic inexact methods when
specialized to the unconstrained and bound-constrained first- and second-order
cases. The deterministic complexity results are finally extended to the
stochastic context, yielding adaptive sample-size rules for subsampling methods
typical of machine learning. Comment: 32 pages
OFFO minimization algorithms for second-order optimality and their complexity
An Adagrad-inspired class of algorithms for smooth unconstrained optimization
is presented in which the objective function is never evaluated and yet the
gradient norms decrease at least as fast as $\mathcal{O}(1/\sqrt{k+1})$ while
second-order optimality measures converge to zero at least as fast as
$\mathcal{O}(1/(k+1)^{1/3})$. This latter rate of convergence is shown to be
essentially sharp and is identical to that known for more standard algorithms
(like trust-region or adaptive-regularization methods) using both function and
derivatives' evaluations. A related "divergent stepsize" method is also
described, whose essentially sharp rate of convergence is slightly inferior. It
is finally discussed how to obtain weaker second-order optimality guarantees at
a (much) reduced computational cost.
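The "objective-function-free" idea in this abstract can be illustrated with a plain Adagrad-style iteration, which uses gradients only and never evaluates the objective. This is a first-order sketch under stated assumptions, not the algorithm class of the paper: it has no second-order component, and the name `adagrad_offo`, the parameter `varsigma`, and the test problem are hypothetical.

```python
import math

def adagrad_offo(grad, x0, step=1.0, varsigma=1e-2, iters=500):
    """Componentwise Adagrad-style iteration using gradients only.

    grad     : callable returning the gradient as a list of floats
    x0       : starting point (list of floats)
    varsigma : small positive initial weight (an assumed safeguard
               that keeps the scaling factors away from zero)
    The objective function itself is never evaluated.
    """
    x = list(x0)
    acc = [varsigma] * len(x)    # running sums of squared gradient entries
    for _ in range(iters):
        g = grad(x)
        for i, gi in enumerate(g):
            acc[i] += gi * gi
            x[i] -= step * gi / math.sqrt(acc[i])
    return x

# Hypothetical example: f(x) = 0.5*(x0 - 1)^2 + (x1 + 2)^2.
quad_grad = lambda x: [x[0] - 1.0, 2.0 * (x[1] + 2.0)]
x_final = adagrad_offo(quad_grad, [0.0, 0.0])
print(x_final, quad_grad(x_final))
```

Because the step in each coordinate is scaled by accumulated gradient information, no line search or sufficient-decrease test on function values is needed, which is what makes such methods attractive when function evaluations are expensive or noisy.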